Evolutionary genomics of three agricultural pest moths reveals rapid evolution of host adaptation and immune-related genes

Abstract Background Understanding the genotype of pest species provides an important baseline for designing integrated pest management (IPM) strategies. Recently developed long-read sequence technologies make it possible to compare genomic features of nonmodel pest species to disclose the evolutionary path underlying the pest species profiles. Here we sequenced and assembled genomes for 3 agricultural pest gelechiid moths: Phthorimaea absoluta (tomato leafminer), Keiferia lycopersicella (tomato pinworm), and Scrobipalpa atriplicella (goosefoot groundling moth). We also compared genomes of tomato leafminer and tomato pinworm with published genomes of Phthorimaea operculella and Pectinophora gossypiella to investigate the gene family evolution related to the pest species profiles. Results We found that the 3 solanaceous feeding species, P. absoluta, K. lycopersicella, and P. operculella, are clustered together. Gene family evolution analyses with the 4 species show clear gene family expansions on host plant–associated genes for the 3 solanaceous feeding species. These genes are involved in host compound sensing (e.g., gustatory receptors), detoxification (e.g., ABC transporter C family, cytochrome P450, glucose-methanol-choline oxidoreductase, insect cuticle proteins, and UDP-glucuronosyl), and digestion (e.g., serine proteases and peptidase family S1). A gene ontology enrichment analysis of rapid evolving genes also suggests enriched functions in host sensing and immunity. Conclusions Our results of family evolution analyses indicate that host plant adaptation and pathogen defense could be important drivers in species diversification among gelechiid moths.

These species especially the two Phthorimaea species are found invading many non-native regions including Asia, Europe, and Africa.Research on these moths have focused largely on their host preference, identification, and management.Despite their importance as major global pests to agriculture, their genomic framework and the evolutionary process of host plant preference in insect pests is still poorly understood (but see [11]).
Host selection and host use in insects is determined by a series of physiological processes including host plant compound sensing, detoxification, and nutrient digestion.Several genes are thought to be involved in these processes that affect host selection [12].Genes associated with sensing phytocompounds include olfactory receptors (OR), gustatory receptors (GR), ionotropic receptors (IR), odorant-binding proteins (OBP), and chemosensory proteins (CSP).Genes associated with detoxification include cytochrome P450 (P450), ATP-binding cassette transporter (ABC), and glutathione S-transferases (GST), and genes associated with digestion include serine protease (SP) and beta-fructo-furanosidases (BFF) [13][14][15][16][17][18][19][20][21].A crucial question in understanding pest evolution is how these genes evolved among pest species and their relatives.Whole genome sequencing of pest species has shown great promise for revealing the evolutionary processes that led to the formation of a pestiferous species.For example, recent studies on the genomic evolution of agricultural pests, with subsequent comparative genomics analyses such as orthology, gene family evolution, selected region detections, and structural variant analyses, have identified putative genetic bases of their ecological features or pest species profiles [22][23][24][25].
Despite the diversity of gelechiid moths, the many studies on the impact of gelechiids to agriculture, and the release of nearly a thousand Lepidoptera genome assemblies in GenBank thus far [26], only a few gelechiid genome assemblies are publicly available [11,27,28].Considering its high species diversity and economic importance, more attention and efforts on genomic data accumulation and exploration are required for further understanding the evolution of this moth family.In this study, we sequenced and assembled the genomes of three gelechiid moth pests, Keiferia lycopersicella, Phthorimaea absoluta, and Scrobipalpa atriplicella to examine their genomic features and how they relate to host preference.Specifically, we investigate how rapidly evolving genes are correlated with host preference and life history.

Sample information and sequencing
Three gelechiid moth species (K.lycopersicella

Genome size and sequence coverage estimations
To verify read quality, we first assessed the HiFi sequence quality using FASTQC v 0.11.7 (RRID:SCR_014583) to summarize read profiles [29].The genome size and sequence coverage were estimated with two methods.First, we counted k-mers and calculated the k-mer density distribution for the HiFi reads using K-Mer Counter (KMC) v.3.2.1 (RRID:SCR_001245) with kmer size of 31 nucleotides.Density distributions were subsequently submitted to GENOMESCOPE v2.0 online tool (RRID:SCR_017014) [30] with default setting for diploid species to estimate the genome size, heterozygosity, sequence coverage and other genomic profiles (Supplementary Figure S1).Second, we mapped HiFi reads to final assemblies to estimate genome size and sequence coverage.This process was conducted in the program MODEST (backmap.plv 0.5) [31].
Estimated genome sizes and read coverages from GENOMESCOPE were used to certify autodetected estimates from HIFIASM assembler (see next section) to ensure the accuracy of autodetected assembling assumptions [32].

Genome assembly, quality assessment, and non-target sequence removal
We used HIFIASM v 0.16.1 (RRID:SCR_021069) to assemble the genome from HiFi reads using default settings, except for reads of P. absoluta, for which we applied a 2 (-l 2) purging level to keep a greater number of haplotigs for downstream purging.We kept more haplotigs because the sequence coverage for this species was low, and it generated the best assembly evaluated by N50 and BUSCO V 5.3.0 (RRID:SCR_015008) completeness (based on the lepidoptera_odb10 database) [33,34].We also applied the haplotig purging pipeline to remove duplicated haplotigs [35].For K. lycopersicella and S. atriplicella, we first mapped the HiFi reads to the assemblies with MINIMAP v. 2.21 (RRID:SCR_008103) and sorted using SAMTOOLS v 1.15 (RRID:SCR_002105) [36,47].The sorted mapped reads were subsequently used to draw the density distribution histogram of coverage using "hist" function in the purge_haplotigs pipeline (RRID:SCR_017616) [34].The histogram was used to identify the peaks of homozygous and heterozygous reads and the low point between the peaks.These values were then used to define the aggressiveness of the purging.Finally, we used low and high coverage cutoffs to purge the duplicated contigs.Other parameters were kept default as suggested by purge_haplotigs pipeline.
For P. absoluta, since its sequence depth is relatively low (see results), we used the Illumina short reads published by [28] to perform the purge_haplotigs.Specifically, we ran HIFIASM with less aggressive purging (l -2) to allow more duplicated haplotypes in the assembly for the haplotig purging pipeline and used the short-read coverage histogram to define the peaks.To identify potential non-target sequences in assemblies, we created blobplots using BLOBTOOLS (RRID:SCR_017618) to visualize the distribution of GC content and read coverage for contigs [37].To determine read coverage, we aligned HiFi reads to the assembly using MINIMAP2 [36].To assign taxonomy to reads, we used BLASTN (RRID:SCR_001598) to blast contigs against the NCBI nt database with an e-value cutoff of 1e-25.Contigs assigned to non-arthropods with deviating GC content and sequence coverage were determined to be non-target sequences and removed from assemblies (Supplementary Figure S2).A BUSCO score using lepidoptera_odb10 database was calculated to evaluate the completeness of each assembly (Table 1).Genome assemblies of the three species are available through NCBI (BioProject accession number: PRJNA932016).

Gene models and annotations
In the genome annotation pipeline, we first identified repeat regions using REPEATMODELER2 (RRID:SCR_015027) [38].The genome assemblies were soft-masked with repeats from three lines of evidence including simple and short repeats, the identified repeats from REPEATMODELER2, and the evidence from the lepidopteran repeat database in Repbase using REPEATMASKER (RRID:SCR_012954) with the blast tool RMBLAST (RRID:SCR_022710) [39,40].The BRAKER2 gene prediction pipeline (RRID:SCR_018964) was applied to soft-masked genomes [41][42][43][44][45][46][47].For K. lycopersicella and S. atriplicella, we used arthropod protein sequences from orthoDB (RRID:SCR_011980) (odb10_arthropoda) in the PROTHINT pipeline (RRID:SCR_021167) to generate hints to train GENEMARK-EP+ (RRID:SCR_011930) [48] and predict gene models alone with the AUGUSTUS (RRID:SCR_008417).For P. absoluta, we also included published RNA sequences to train the gene model [49].Specifically, we ran BRAKER2 pipeline twice (one with protein and one with RNA) and used TSEBRA [50] with default settings to integrate the two models.To further refine models for the three species, we removed genes identified solely by AUGUSTUS ab initio prediction without hint supports (e.g., introns, start and stop codons) from the protein database using the python script "selectSupportedSubsets.py" provided by BRAKER2.Final gene models were evaluated using a BUSCO protein model with the lepidoptera_odb10 database.Gene model profiles, including the monoexonic rate and sequence lengths of gene, intron, and exon, were summarized using GFACS v1.0.0 (RRID:SCR_022017) [51] (Supplemental Table S1).
For functional annotations, we first annotated gene function by blasting transcript sequences from the BRAKER2 pipeline to the RefSeq non-redundant protein database and Swiss-Prot arthropodan protein database (Reviewed UniPort database) using the blastp function in DIAMOND v2.0.9 (RRID:SCR_016071) [43].Additionally, we performed default INTERPROSCAN (RRID:SCR_005829) annotation which integrates 14 member databases including PFAM and PANTHER [52].For gene ontology terms (GO terms) and KEGG pathway annotations (RRID:SCR_012773), we queried transcript sequences to the PANNZER webserver (Protein annotation with z-score) [53] and KEGG automatic annotation server (KAAS) [54] with bidirectional best hits.

Phylogeny and Gene evolution
To explore the evolution of the three gelechiid moths and their genes, we created a phylogeny using two additional published genomes of Gelechiidae: Phthorimaea operculella and Pectinophora gossypiella.We used the genome of Hyposmocoma kahamanoa as an outgroup, as this species belongs to a moth family closely related to Gelechiidae (Cosmopterigidae) [55,56].
Published genome assembly of P. operculella (GCA_024500475.1)was downloaded from NCBI GenBank while those of P. gossypiella (GCF_024362695.1)and H. kahamanoa (GCF_003589595.1)were downloaded from the NCBI Reference Sequence (RefSeq) Database (O' Leary et al., 2016).We performed the same BUSCO approach using the lepidoptera_odb10 database to obtain compatible single-copy amino acid orthologs for these three species [32].The final data matrix contained 4,876 single copy orthologs that contained at least two ingroup species and the H. kahamanoa outgroup (385 orthologs did not fit these parameters and were removed).
Phylogeny of these five gelechiid moth species was constructed using the concatenated sequences from the BUSCO single copy gene alignments.We assigned a single substitution model (Q.insect+FO+G4 substitution model, the Q matrix estimated for insects) to the alignment and built a maximum likelihood tree in IQ-tree v 2.1.3(RRID:SCR_017254) [58][59][60][61].Branch supports were calculated using ultrafast bootstrap [62] and SH-aLRT [63][64].Since the genome assembly of S. atriplicella is less complete (see results), we also constructed phylogeny with the four gelechiid moth species (excluding S. atriplicella) and H. kahamanoa (outgroup) for gene family analysis (see below) to avoid the noise from the incomplete gene model of S. atriplicella with same tree building approach.
To investigate gene family evolution, we inferred an ultrametric tree from the concatenated sequence species tree using TREEPL with default settings [65].For gene family identification, we employed ORTHOFINDER v2.5.2 (RRID:SCR_017118) using the primary isoform of the annotated gene models from each of the five species (four gelechiid species and one outgroup) [66].Gene models of K. lycopersicella and P. absoluta were predicted from the BRAKER2 pipeline while the other three gene models were directly downloaded from appropriate databases.In ORTHOFINDER, we chose gene families as defined by phylogenetic hierarchical orthogroups (HOGs), an approach which is thought to be more accurate than similarity-based methods [66].For each gene family, the HOGs gene-counting matrix and ultrametric tree were used to estimate repertoire size changes in CAFE v 5.0.0 (RRID:SCR_018924) [67].We extracted HOGs under rapid repertoire size expansion and contraction, with the significance level set to 0.01 and branch lengths calculated from the ultrametric tree.For each gene associated with these HOGs, we used the top annotated function (lowest e-value) from INTERPROSCAN to represent the gene function.
For the HOGs with significant rapid expansion and contraction, we assessed their gene functions and GO terms using INTERPROSCAN annotations.To standardize annotations, we reannotated gene functions for the three downloaded gene models (Phthorimaea operculella, P. gossypiella, and H. kahamanoa) using default settings in INTERPROSCAN (Jones et al., 2014).For associated GO terms, we performed enrichment analysis using the R package TOPGO 2.40.0 (RRID:SCR_014798) [68] with a significance level of 0.05 for both fisher classic and weight01 algorithms.
After haplotig removal, considerable reductions in the number of contigs were found while assembled sizes and BUSCO completeness remained nearly consistent, indicating that smaller duplicated contigs were removed.From the assemblies of K. lycopersicella and S. atriplicella, we identified non-target sequences contributing to a small portion of the assemblies.In K. lycopersicella, a 10 kbp-contig was blasted to Streptophyta while in S. atriplicella, 15 small contigs (total 380 kbp) were blasted to Proteobacteria.After removing these non-target contigs, 443.65 Mb from 61 contigs, 652.7 Mb from 688 contigs, and 301.15 Mb from 7,092 contigs were found in the assemblies of K. lycopersicella, P. absoluta, and S. atriplicella, respectively.BUSCO scores for these assemblies are shown in Table 1.Estimated genome sizes from GENOMESCOPE for K. lycopersicella, P. absoluta, and S. atriplicella were 396, 514, and 276 million base pairs (Mb); much smaller than our assemblies and likely due to the exclusion of extremely high number of k-mers from the repeats in the genomes.Estimating genome size with MODEST resulted in 422, 1,040, and 344 Mbs with peak coverages at 49X, 10X, and 55X for K. lycopersicella, P. absoluta, and S. atriplicella, respectively (Supplemental Table S2).The estimated genome and assembly sizes of P. absoluta based on MODEST were nearly twice that of GENOMESCOPE, this discrepancy was likely due to different estimated sequence coverage.
For gene annotation, we first annotated repeats using REPEATMODELER2 [38] and P. absoluta showed the highest proportion of repeats (54.4%), followed by K. lycopersicella (48.22%) and S. atriplicella (32.83%).Soft-masked genomes were used to run the BRAKER2 pipeline with protein evidence for K. lycopersicella and S. atriplicella, resulting in 15,405 and 14,647 genes, respectively [42].For P. absoluta, we used both protein and RNA sequence evidence to predict the gene model.After removing genes without hint support, the gene model with 19,106 genes was used for functional annotation.BUSCO scores for these gene models reflect their assembly features, including the higher duplication rate in P. absoluta and higher missing rate in S. atriplicella (Table 1).

Phylogeny and gene family evolution
The maximum likelihood tree, derived from the concatenated supermatrix, shows that K. lycopersicella and P. operculella are most closely related (Figure 1).Phthorimaea absoluta, another species feeding on solanaceous hosts, is the sister species to K. lycopersicella and P. operculella.It is noteworthy that P. absoluta was previously and widely recognized as Tuta absoluta indicating a need of a comprehensive phylogenomic analysis on this group.Scrobipalpa atriplicella, an amaranthaceous feeder, is recovered as the sister taxon to the other three members of subfamily Gelechiinae in the six-species phylogeny (Supplementary Figure S3).Finally, Pectinophora gossypiella, a member of subfamily Apatetrinae, is the sister taxon to all four Gelechiinae species in the phylogeny, supporting the current taxonomic arrangement [69] (Figure 1).
We identified 14,384 HOGs in the protein sequences of the six species, and 809 HOGs were found to evolve rapidly at least in one branch along the ultrametric tree (Supplemental Table S3).Gene family evolution analyses showed a general pattern of rapid expansions at the tips of the tree and rapid contractions along internal branches (Figure 1).Specifically, 85 and 52 HOGs were identified in K. lycopersicella with rapid repertoire size expansion and contraction, respectively.Among these HOGs, 67 are annotated with gene functions where 5 are putatively involved in host plant adaptation (Glucose-methanol-choline oxidoreductase, Cytochrome P450 superfamily, insect cuticle proteins, and trypsin family serine proteases), 37 involved in immunity (PiggyBac transposable elements, Retrotransposon Pao-related genes, Serpin superfamilies, and Toll-like receptors (Supplemental Tables S4 and S5).
We also found that 68

Gene ontology enrichment analyses
From genes that were identified to be rapidly evolving, we found a handful of biological function terms that were enriched from the rapidly evolving genes.(Table 2).This includes sensory perception of taste (GO:0050909), toll-like receptor signaling pathway (GO:0002224), and immune response (GO:0006955), DNA integration (GO:0015074), and plasma membrane phospholipid scrambling (GO:0017121).All the enriched terms, including those terms passing through fisher classic but not weight01 threshold, are listed in Supplemental Table S6.

Genome assembly quality and its implications for gene family evolution
Long read sequencing technologies such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) have provided a promising future for de novo assemblies of highquality genomes for non-model species [70,71].These recent advancements have the potential to significantly expand our understanding of the evolutionary mechanisms underlying plant-insect interactions and contribute to prevent future catastrophic crop damage.
In this study, we used HiFi long-read to assemble genomes for three gelechiid moth species (Table 1).Although BUSCO completeness of the S. atriplicella genome was relatively low (73.3%),3,246 of its BUSCO genes could be used to reconstruct a phylogeny with four other gelechiid species (Supplemental Figure S3).However, we note that the robustness of gene family evolution analyses relies heavily on the quality of the genome assembly and the subsequent gene model predictions.The incomplete genome assembly with higher heterozygosity or lower sequence coverage could confound the result by providing missing, incomplete, or duplicated gene prediction.Therefore, we used phylogeny with the four more complete genome assemblies and an outgroup to detect gene-family evolution with rapid repertoire size changes.We note that the assembly of P. absoluta has higher BUSCO duplication rate, likely the result of shallower sequence depth and higher heterozygosity, could possibly overestimate the gene copy number.
Although programs such as CAFE were designed to cope with such issues [72], the result of repertoire size changes for species with lower assembly completeness should be interpreted with some caution.
It should also be noted that sequencing interferences were encountered for S. atriplicella library samples, which were prepared together with those of P. absoluta and K. lycopersicella using the same DNA extraction, clean-up, library preparation, and sequencing protocols.We therefore sequenced the amplified DNA library with trial-and-error.According to the BUSCO completeness score, only part of the genome was covered by the HiFi reads despite the high sequence depth (37X and 55X from GENOMESCOPE and MODEST, respectively).This is likely due to replication bias or errors during the amplification process.Based on these experiences, we would recommend optimization of DNA extraction to enable native DNA to be sequenced over the whole genome amplification route we followed in with this challenging sample.
Finally, the different methods and assembled sizes resulted in considerable differences in estimated genome sizes.This discrepancy seems to be more prominent in taxa with lower sequence coverage and higher heterozygosity (e.g., P. absoluta).We therefore recommend using multiple approaches to better estimate the true genome size and understand the quality of the assembly.

Genomic adaption of the solanaceous feeding gelechiid moths
Moths use a combination of olfactory (smell) and gustatory (taste or contact) chemoreception to find oviposition sites.Olfactory and gustatory cues are often thought to function in long-and short-range detection of suitable hosts, respectively, but volatile cues at the host surface may also stimulate olfactory sensilla and determine oviposition choice in some species [73].Indeed, for P. absoluta, olfactory cues in the form of tomato leaf volatiles result in oviposition rates indistinguishable from to those involving direct contact with the leaf surface [74].This is not the case for K. lycopersicella and P. operculella where contact chemoreception appears to play a more important role in oviposition choice, with surface compounds of host plants shown to stimulate egg-laying in both species [75][76][77], and those of non-hosts shown to act as deterrents in P. operculella [76].It is notable that our gene family evolution analyses show an increase in rapidly evolving genes associated with host plant sensing, particularly gustatory receptors [78], coincident with a shift to solanaceous feeding in gelechiid moths.This could serve as an indication of selective pressure on host plant association through female oviposition or larval feeding choice.It is likely that caterpillars of leaf-rolling and leaf-mining species are confined to the plant where they hatch [79,80] and therefore do not search extensively for a new host plant, but the role of contact chemoreception in oviposition choice in gelechiids is better characterized than that of caterpillar host searching.
While an increase in host plant association genes correlates with a shift to feeding on Solanaceae, we do not observe a directional shift in terms of gains/losses.Thus, while we detected gains in gustatory receptor genes in P. absoluta and P. operculella, K. lycopersicella shows losses in this gene family.While a number of studies have shown a correlation between host range and chemosensory receptor gene repertoire size or specific losses [81][82][83], the gelechiids studied here appear to have experienced a host shift from one plant family (Amaranthaceae in S. atriplicella) to another (Solananceae in P. absoluta and its relatives), instead of an expansion or contraction of host range.Thus, we might not expect directional changes in chemosensory receptor repertoire size in instances of host shifts in the same manner that has been observed after expansion or contraction of host range.
For many lepidopteran species, detoxification of plant secondary metabolites is essential in host adaptation [84].Several genes, including ABC transporters, P450, GMC oxidoreductase, UGT, and insect cuticle protein play important roles in detoxifying the defending compounds from their host plants [85,86].Our gene family evolution analysis reveals that these detoxification genes also rapidly evolve in the focal gelechiid moths, while most expansions are found in the two solanaceous feeding species (i.e., P. absoluta and P. operculella) (Figure 1).It is noteworthy that K. lycopersicella, the sister species of P. operculella in our tree, shows only four detoxification genes expanding.This result may be explained partially by the different annotation pipelines that were used for these two genomes.However, since the gene model of K. lycopersicella covers 93.2% of the BUSCO single copy genes and CAFE was designed controlling such confounding factors from the incomplete or biased annotation, it is fair to conclude that detoxification gene expansion is not a general feature of solanaceous-feeding species [72].Interestingly, K. lycopersicella and P. absoluta are found to prefer tomato over potato while P. operculella feeds mainly on potato, implying that the feeding and oviposition preferences are not directly related to the evolution of detoxification genes.One other possible explanation is that the rapid expansion of detoxification genes on P. absoluta and P. operculella has resulted from the frequent exposure to pesticides, as these two species are well-known agriculture pests with many pesticide resistances reported [87][88][89].Although K. lycopersicella is also considered an agricultural pest, the damage it causes is not comparable to the two Phthorimaea species [7].Further studies using population genomic approaches to determine the relationship between detoxification-gene evolution and pesticide resistance might provide more evidence supporting or opposing this hypothesis.
One other important mechanism in host adaptation involves digesting nutrients from plant tissue.For many phytophagous insects, coping with host plant protein peptidase inhibitors and efficiently breaking down these complex molecules are an essential first step in digestion [20].By comparing genomes of four gelechiid species, we identified many serine protease genes which are known for its function of host-plant protein digestion.According to the gene evolution results, serine protease genes rapidly expanded in P. absoluta and P. operculella, and, P. gossypiella (Figure 1).These genes not only digest proteins, they also act as species-specific antagonists interfering with the function of host plant peptidase inhibitors [94][95][96].The expansion of trypsin and chymotrypsin in two global pests on solanaceous crops implies their underlying contributions to important pest species features such as shorter life spans relative to K. lycopersicella, a species that has fewer copies of these genes [7,[97][98][99][100].In general, our gene family evolution analysis reveals indirect but important signals of genome evolution underlying the host adaptation in these agricultural pests.

Rapid evolution of retrotransposable elements and other immune related genes in gelechiid moths
We found that many of the rapidly evolving genes present in all four gelechiid species are genes associated with retrotransposons and reverse transcription.For example, HOGs annotated with Pao, a retrotransposable element involved in antiviral mechanism, were found to be rapidly evolving in all four gelechiid species (Supplemental Table S3).This element usually contains five protein domains where reverse transcriptase (RTase), retrotransposon gag domain, aspartic protease (or aspartic peptidase), and Ribonuclease H superfamily (RNase H) are repeatedly found evolving rapidly [101,102].The RTase in this retrovirus-like element reverse-transcribes the invading virus RNA into DNA (stored in retrotransposon sequences or forming a viral circular DNA), and the infection is suppressed by RNase H through cleavage of the DNA-RNA hybrids or by the downstream RNAi pathway [103][104][105][106][107].Many other significant HOGs found in these gelechiid species may also have similar antiviral mechanisms, including PiggyBac transposable element, Ty3 transposon capsid-like protein, and Transposase, L1 [108,109] (Supplemental Table S3).However, rapid repertoire size changes of these retrotransposable elements could be the result of the transposon activity instead of gene copy accumulation through recombination.
We also found many rapidly evolving HOGs annotated with immune related genes such as those involved in the Toll-like receptor (TLC) pathway.These genes (e.g.Toll-like receptor, Leucine-rich repeat domain superfamily, and NF-kappa-B inhibitor-interacting Ras-like protein) were found with rapid size changes in all tested gelechiid species.Unlike retrotransposable elements, the TLC pathway targets a wider range of pathogens including bacteria, fungi, and viruses.Finally, many other genes that we identified have putative functions in immunity, including Serpin superfamily, Immunoglobulin, Pacifastin domain, and Gamma interferon inducible lysosomal thiol reductase GILT.The presence of many rapidly evolving, immune-related genes suggests that managing potential threats from pathogens is also a significant selection pressure.This finding is supported by comparative genomic studies on other moths where viral defending genes (RNase H, RTase, retrotransposon Pao, Toll-like receptor, Leucine-rich repeat domain) were identified to evolve rapidly [11,110].In sum, our gene family evolution approach highlights the importance of host adaptation and immune-related genes in these closely related gelechiid species.Table 1.Assembly statistics of the three newly sequenced gelechiid moth species, compared to statistics of the published Phthorimaea absoluta v1 assembly.BUSCO results from the Phthorimaea absoluta v1 assembly have been re-analyzed using BUSCO v5.

Phthorimaea absoluta v1
Phthorimaea  References supporting categorizations of gene function are provided in Supplemental Table S5.
Click here to access/download [NCBI:txid1511203], P. absoluta [NCBI:txid702717], S. atriplicella [NCBI:txid687131]) were collected from laboratory colonies at University of California, Davis, USA, Khumaltar, Lalitpur, Nepal, and the Saskatoon Research and Development Centre of Agriculture and Agri-Food, Canada, respectively.Genomic DNA from one moth of each species was extracted from the whole moth (larva) using the DNA isolation protocol of the OmniPrep Genomic DNA Extraction Kit (G-Biosciences, St. Louis, MO).For S. atriplicella, we encountered sequencing interference for several library samples.Therefore, we amplified genomic DNA with illustra™ GenomiPhi V2 DNA Amplification Kits, Cytiva and the amplified DNA was used to replace the native DNA extracted from the tissue.The genomic and amplified DNA samples were subsequently used to perform fragment size selection and sample purification with the DNeasy PowerClean CleanUp Kit before library preparation.Libraries were sequenced with a single SMRT cell in the Pacbio Sequel IIe system.The DNA clean-up, library construction, and sequencing steps were performed in the Interdisciplinary Center for Biotechnology Research (ICBR) at the University of Florida.The HiFi sequences are deposited in NCBI (BioProject accession number: PRJNA932016; SRA sample accession: SRR23497930, SRR23497929, and SRR23497928).

Figure 1 .
Figure 1.(left) Maximum likelihood tree of four gelechiid species from a concatenated supermatrix analysis of 4,876 single-copy genes, presented alongside a color-coded number of rapidly evolving gene families (red: expanding, blue: contracting).The tree is rooted with Hyposmocoma kahamanoa (Gelechioidea: Cosmopterigidae).Nodes are labelled with branch supports (ultrafast bootstrap/SH-aLRT).(right) The list of rapidly evolving gene families that are associated with host plants includes a host compound-sensing gene family, 20 detoxification genes, and seven digestion-related genes.Numbers in color-coded cells represent repertoire size change in corresponding branches on the tree in (left), and gene-family functions are shown at the top of columns.The significant repertoire size changes are marked with outside borders on the cell.

Table 2 .
Enriched GO terms from the rapidly evolving genes of the five gelechiid species in this study.